Detecting annotation noise in automatically labelled data

Authors

  • Ines Rehbein
  • Josef Ruppenhofer
Abstract

We introduce a method for error detection in automatically annotated text, aimed at supporting the creation of high-quality language resources at affordable cost. Our method combines an unsupervised generative model with human supervision from active learning (AL). We test our approach on in-domain and out-of-domain data in two languages, both in AL simulations and in a real-world setting. In all settings, the results show that our method detects annotation errors with high precision and high recall.
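
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of flagging likely annotation errors for human review and scoring the result by precision and recall. It is not the authors' method: the unsupervised generative model is replaced here by a much simpler stand-in (vote entropy over the outputs of several automatic taggers), and all function names, the review budget, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def vote_entropy(votes):
    """Shannon entropy (in bits) of the label distribution among the votes."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def flag_suspects(votes_per_token, budget):
    """Return the indices of the `budget` tokens with the highest vote entropy.

    Stand-in for an active learning query step: these are the instances
    that would be handed to a human annotator for checking.
    """
    ranked = sorted(
        range(len(votes_per_token)),
        key=lambda i: vote_entropy(votes_per_token[i]),
        reverse=True,
    )
    return set(ranked[:budget])


def precision_recall(flagged, true_errors):
    """Precision and recall of the flagged set against the known error set."""
    hits = len(flagged & true_errors)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(true_errors) if true_errors else 0.0
    return precision, recall


# Toy data (made up): three hypothetical taggers label five tokens.
votes_per_token = [
    ["DET", "DET", "DET"],
    ["NOUN", "VERB", "NOUN"],
    ["VERB", "VERB", "VERB"],
    ["ADJ", "NOUN", "VERB"],
    ["NOUN", "NOUN", "NOUN"],
]
true_errors = {1, 3}  # assumed positions where the automatic label is wrong

flagged = flag_suspects(votes_per_token, budget=2)
precision, recall = precision_recall(flagged, true_errors)
```

In this toy setup the two tokens on which the taggers disagree are the ones flagged for review. A real system would replace the entropy score with the posterior of whatever model is used, and would update that model after each human decision.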

Similar resources

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. In most cases, inappropriate user-generated tags and images without any tags, two of the challenges in this field, have a negative effect on the query result. In this paper, a new method is presented for automatic image...

An Image-Based System for Change Detection on Tunnel Linings

We present an automated system for detecting visual changes on tunnel linings. By registering new images to a three-dimensional tunnel surface model, recovered using Structure and Motion techniques, we are able to detect and localise changes accurately in order to assist visual inspection by a human expert. We formulate the problem of detecting changes probabilistically and exploit different fe...

An Approach to Automatic Music Band Member Detection Based on Supervised Learning

Automatically extracting factual information about musical entities, such as detecting the members of a band, helps to build advanced browsing interfaces and recommendation systems. In this paper, a supervised approach to learning to identify and extract the members of a music band from related Web documents is proposed. While existing methods utilize manually optimized rules for this purpos...

Structural Damage Assessment Via Model Updating Using Augmented Grey Wolf Optimization Algorithm (AGWO)

Some civil engineering infrastructures are planned for a Structural Health Monitoring (SHM) system based on their importance. Identifying and detecting damage automatically at the right time is one of the major objectives of such a system. One method to meet this objective is model updating with the use of optimization algorithms in structures. This paper is aimed to evaluate the...

From Field Notes towards a Knowledge Base

We describe the process of converting plain text cultural heritage data to elements of a domain-specific knowledge base, using general machine learning techniques. First, digitised expedition field notes are segmented and labelled automatically. In order to obtain perfect records, we create an annotation tool that features selective sampling, allowing domain experts to validate automatically la...

Publication date: 2017